14 research outputs found

    Generating Natural Questions About an Image

    There has been an explosion of work in the vision & language community during the past few years, from image captioning to video transcription and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance, which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language. Comment: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics

    The Outcomes of Accidental Ingestion of Hand Sanitizer

    Background: Children are exposed to numerous chemicals, such as hand sanitizers, and to the associated risk of poisoning. Awareness of such poisoning symptoms and their management is critical for healthcare providers, as it can induce dangerous conditions. This study evaluated the clinical and paraclinical parameters of children who ingested hand sanitizers. Methods: In this cross-sectional study, we evaluated the cases of children who accidentally ingested hand sanitizer and were referred to Akbar Hospital in Mashhad City, Iran (a referral pediatric poisoning center) in the first 6 months of 2020. The relevant medical information and laboratory parameters were recorded. Results: In total, 20 patients were evaluated, of whom 80% were male. The mean±SD age of the study subjects was 4.9±4.2 years (2-15 y). The ingestion was accidental in the study participants. The amount ingested was approximately equal to a sip (3-7 cc). All clinical and laboratory parameters were normal according to their age. Only 1 (5%) case presented metabolic acidosis. No study subject manifested hypoglycemia or loss of consciousness. Thus, they only received supportive care and were observed for ≥12 hours. Conclusion: Based on the present study results and absence of symptoms in the explored cases of hand sanitizer ingestion, the suitability of hand sanitizer solutions. Therefore, the formulation of these products should be carefully evaluated.

    A Survey of Current Datasets for Vision and Language Research

    Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language, from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts; however, there are different strengths and weaknesses in each. Comment: To appear in EMNLP 2015, short proceedings. Dataset analysis and discussion expanded, including an initial examination into reporting bias for one of them. F.F. and N.M. contributed equally to this work.

    CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures

    Learning commonsense causal and temporal relations between events is one of the major steps towards deeper language understanding. This is even more crucial for understanding stories and script learning. A prerequisite for learning scripts is a semantic framework which enables capturing rich event structures. In this paper we introduce a novel semantic annotation framework, called Causal and Temporal Relation Scheme (CaTeRS), which is unique in simultaneously capturing a comprehensive set of temporal and causal relations between events. By annotating a total of 1,600 sentences in the context of 320 five-sentence short stories sampled from the ROCStories corpus, we demonstrate that these stories are indeed full of causal and temporal relations. Furthermore, we show that the CaTeRS annotation scheme enables high inter-annotator agreement for broad-coverage event entity annotation and moderate agreement on semantic link annotation.

    LSDSem 2017 Shared Task: The Story Cloze Test


    Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

    The popularity of image sharing on social media reflects the important role visual context plays in everyday conversation. In this paper, we present a novel task, Image-Grounded Conversations (IGC), in which natural-sounding conversations are generated about shared photographic images. We investigate this task using training data derived from image-grounded conversations on social media and introduce a new dataset of crowd-sourced conversations for benchmarking progress. Experiments using deep neural network models trained on social media data show that the combination of visual and textual context can enhance the quality of generated conversational turns. In human evaluation, a gap between human performance and that of both neural and retrieval architectures suggests that IGC presents an interesting challenge for vision and language research.

    From Event to Story Understanding

    Building systems that have natural language understanding capabilities has been one of the oldest and most challenging pursuits in AI. In this thesis, we present our research on modeling language in terms of 'events' and how they interact with each other in time, mainly in the domain of stories. Deep language understanding, which enables inference and commonsense reasoning, requires systems that have large amounts of knowledge which would enable them to connect surface language to the concepts of the world. A part of our work concerns developing approaches for learning semantically rich knowledge bases on events. First, we present an approach to automatically acquire conceptual knowledge about events in the form of inference rules, which can enable commonsense reasoning. We show that the acquired knowledge is precise and informative and can be employed in different NLP tasks. Learning the stereotypical structure of related events, in the form of narrative structures or scripts, has been one of the major goals in AI. The research on narrative understanding has been hindered by the lack of a proper evaluation framework. We address this problem by introducing a new framework for evaluating story understanding and script learning: the 'Story Cloze Test' (SCT). In this test, the system is posed with a short four-sentence narrative context along with two alternative endings to the story, and is tasked with choosing the right ending. Along with the SCT, we have worked on developing the ROCStories corpus of about 100K commonsense short stories, which enables building models for story understanding and story generation. We present various models and baselines for tackling the task of SCT and show that humans can perform with an accuracy of 100%. One prerequisite for understanding and proper modeling of events and their interactions is to develop a comprehensive semantic framework for representing their variety of relations. We introduce the 'Causal and Temporal Relation Scheme' (CaTeRS), which is a rich semantic representation for event structures, with an emphasis on the domain of stories. The impact of the SCT and the ROCStories project goes beyond this thesis: numerous teams and individuals across academia and industry have been using the evaluation framework and the dataset for a variety of purposes. We hope that the methods and the resources presented in this thesis will spur further research on building systems that can effectively model eventful context and understand and generate logically sound stories.